-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
gh-105699: Use a _Py_hashtable_t for the PyModuleDef Cache #106974
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-105699: Use a _Py_hashtable_t for the PyModuleDef Cache #106974
Conversation
@Yhg1s, any objections to this change in 3.12 (pending a review here)? I suspect the current use of dict will eventually cause a crash somewhere and more-so as folks use subinterpreters more. I'm asking here because I'd prefer 3.12 to match 3.13. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this fixes a fundamental issue, I'm okay with backporting it to 3.12rc1, if it can get into a decent shape in time.
Python/import.c
Outdated
goto finally; | ||
} | ||
|
||
PyObject *extensions = EXTENSIONS.dict; | ||
if (extensions == NULL) { | ||
key = hashtable_key_from_2_strings(filename, name, ':'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:
is valid in filenames (on most filesystems). Why not /
, which isn't? Also, since it's used in two places and not easy to miss/misinterpret, it should probably be a #define/static constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An module name (identifier) will never have ":" or "/" in it. I figured the separator didn't matter as long as it wasn't a valid character in identifiers. I thought "/" might be a little confusing when debugging since it will likely be in the filename, so I picked the next character that made sense to me in this context, ":".
If you think there's an advantage to "/" (or any other character), or a disadvantage to ":" or you feel strongly about it, I'd be glad to change the separator. It isn't critical to me.
Python/import.c
Outdated
goto finally; | ||
} | ||
res = 0; | ||
/* This decref would be problematic if the module def were |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That brings up a good point: shouldn't the PyModuleDefs placed in the shared cache be made immortal? Otherwise how can they sensibly be shared at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. We don't use module defs like the usual objects and generally assume they are statically defined, but it would be good to play it safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Co-authored-by: T. Wouters <thomas@python.org>
Thanks @ericsnowcurrently for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12. |
Sorry, @ericsnowcurrently, I could not cleanly backport this to |
…hongh-106974) This fixes a crasher due to a race condition, triggered infrequently when two isolated (own GIL) subinterpreters simultaneously initialize their sys or builtins modules. The crash happened due the combination of the "detached" thread state we were using and the "last holder" logic we use for the GIL. It turns out it's tricky to use the same thread state for different threads. Who could have guessed? We solve the problem by eliminating the one object we were still sharing between interpreters. We replace it with a low-level hashtable, using the "raw" allocator to avoid tying it to the main interpreter. We also remove the accommodations for "detached" thread states, which were a dubious idea to start with. (cherry picked from commit 8ba4df9)
GH-107412 is a backport of this pull request to the 3.12 branch. |
Thanks for the reviews, @Yhg1s! |
…-106974) (gh-107412) gh-105699: Use a _Py_hashtable_t for the PyModuleDef Cache (gh-106974) This fixes a crasher due to a race condition, triggered infrequently when two isolated (own GIL) subinterpreters simultaneously initialize their sys or builtins modules. The crash happened due the combination of the "detached" thread state we were using and the "last holder" logic we use for the GIL. It turns out it's tricky to use the same thread state for different threads. Who could have guessed? We solve the problem by eliminating the one object we were still sharing between interpreters. We replace it with a low-level hashtable, using the "raw" allocator to avoid tying it to the main interpreter. We also remove the accommodations for "detached" thread states, which were a dubious idea to start with. (cherry picked from commit 8ba4df9)
This fixes a crasher due to a race condition, triggered infrequently when two isolated (own GIL) subinterpreters simultaneously initialize their sys or builtins modules. The crash happened due the combination of the "detached" thread state we were using and the "last holder" logic we use for the GIL. It turns out it's tricky to use the same thread state for different threads. Who could have guessed?
We solve the problem by eliminating the one object we were still sharing between interpreters. We replace it with a low-level hashtable, using the "raw" allocator to avoid tying it to the main interpreter.
We also remove the accommodations for "detached" thread states, which were a dubious idea to start with.
(This approach does not have the same potential performance penalty as the linked list we used in gh-106899.)